Creating hidden Markov models for fast speech
Abstract
This paper deals with the problem of building HMMs suitable for fast speech. Fast speech leads to increased error rates on various tasks. The first part of the paper presents an automatic procedure for splitting speech material into categories according to speaking rate, a step that is fundamental to all investigations of speaking rate. The second part discusses the problem of the sparse data available for estimating HMMs for fast speech and compares different methods for overcoming it. The main emphasis is placed on robust reestimation techniques such as maximum a posteriori (MAP) estimation, and on methods that reduce the variability of the speech signal and thereby allow the number of HMM parameters to be reduced. Vocal tract length normalization (VTLN) is chosen for that purpose. The last part compares various combinations of the methods discussed, based on error rates for continuous speech recognition on fast speech. The best method (VTLN followed by MAP reestimation) decreases the overall error rate by 10% relative to the baseline system.
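To illustrate why MAP reestimation is considered robust when little fast-speech data is available, the following is a minimal sketch of the standard MAP update for the mean of a single Gaussian. The abstract does not state which variant the paper uses, so the function name `map_mean_update`, the prior weight `tau`, and the list-based data layout are assumptions made for illustration only.

```python
def map_mean_update(prior_mean, frames, posteriors, tau=10.0):
    """MAP re-estimate of one Gaussian mean vector (illustrative sketch).

    prior_mean : mean of the prior model, e.g. trained on all speaking rates
    frames     : fast-speech feature vectors (list of equal-length lists)
    posteriors : occupation probability of this Gaussian for each frame
    tau        : prior weight (hypothetical value; larger = trust the prior more)
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)          # soft count of frames for this Gaussian
    weighted_sum = [0.0] * dim
    for x, g in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += g * x[d]
    # Interpolate between the prior mean and the data, weighted by tau and occupancy.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]
```

With no adaptation data the update returns the prior mean unchanged, and as the occupancy grows the estimate moves toward the sample mean of the fast-speech frames, which is the behavior that makes the method attractive for sparse data.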
Similar resources
Speaker Independent Speech Recognition Using Hidden Markov Models for Persian Isolated Words
Speech enhancement based on hidden Markov model using sparse code shrinkage
This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on a Laplace-Gaussian combination (for clean speech and noise, respectively) in the HMM ...
Creating hidden Markov models for fast speech by optimized clustering
Previous studies have shown that recognition accuracy often degrades severely at higher speech rates, which can basically be traced back to two main dimensions: acoustic and phonemic. Reasons for this effect can be found in the phonemic field (e.g. elisions) as well as on the acoustic level: with increasing rates of speech, the spectral characteristics change. A main obstacle in this context is...
Speaker normalization and pronunciation variant modeling: helpful methods for improving recognition of fast speech
The presented paper addresses the problem of creating hidden Markov models for fast speech. The major issues discussed are robust parameter estimation and reducing within-model variations. Regarding the first issue, the use of the maximum a posteriori parameter estimation is discussed. To reduce within-model variations, a maximum likelihood based vocal tract length normalization procedure and a...